doi:10.1371/journal.pone.0119491 There is eight groups of mice (7–10 per group): trisomic and controls (Genotype variable), CS mice injected with saline or with memantine (Treatment variable) and SC mice injected with saline or with memantine, as described.CS - this is the context-shock (CS) group, SC - the shock-context group (Behavior variable).
“Context fear conditioning (CFC) was performed as described [16,39,42]. Briefly, mice were placed in a novel cage (Med Associates, St. Albans, VT, Modular Mouse Test Chamber), allowed to explore for three minutes and then given an electric shock (2 s, 0.7 mA, constant electric current). These mice are the context-shock (CS) group and learn to associate the context with the aversive stimulus. These mice are the shock-context (SC) group and do not acquire conditioned fear.”
Data loading:
data_mice <- read.csv('Data_Cortex_Nuclear.csv')
str(data_mice)
## 'data.frame': 1080 obs. of 82 variables:
## $ MouseID : Factor w/ 1080 levels "18899_1","18899_10",..: 46 53 54 55 56 57 58 59 60 47 ...
## $ DYRK1A_N : num 0.504 0.515 0.509 0.442 0.435 ...
## $ ITSN1_N : num 0.747 0.689 0.73 0.617 0.617 ...
## $ BDNF_N : num 0.43 0.412 0.418 0.359 0.359 ...
## $ NR1_N : num 2.82 2.79 2.69 2.47 2.37 ...
## $ NR2A_N : num 5.99 5.69 5.62 4.98 4.72 ...
## $ pAKT_N : num 0.219 0.212 0.209 0.223 0.213 ...
## $ pBRAF_N : num 0.178 0.173 0.176 0.176 0.174 ...
## $ pCAMKII_N : num 2.37 2.29 2.28 2.15 2.13 ...
## $ pCREB_N : num 0.232 0.227 0.23 0.207 0.192 ...
## $ pELK_N : num 1.75 1.6 1.56 1.6 1.5 ...
## $ pERK_N : num 0.688 0.695 0.677 0.583 0.551 ...
## $ pJNK_N : num 0.306 0.299 0.291 0.297 0.287 ...
## $ PKCA_N : num 0.403 0.386 0.381 0.377 0.364 ...
## $ pMEK_N : num 0.297 0.281 0.282 0.314 0.278 ...
## $ pNR1_N : num 1.022 0.957 1.004 0.875 0.865 ...
## $ pNR2A_N : num 0.606 0.588 0.602 0.52 0.508 ...
## $ pNR2B_N : num 1.88 1.73 1.73 1.57 1.48 ...
## $ pPKCAB_N : num 2.31 2.04 2.02 2.13 2.01 ...
## $ pRSK_N : num 0.442 0.445 0.468 0.478 0.483 ...
## $ AKT_N : num 0.859 0.835 0.814 0.728 0.688 ...
## $ BRAF_N : num 0.416 0.4 0.4 0.386 0.368 ...
## $ CAMKII_N : num 0.37 0.356 0.368 0.363 0.355 ...
## $ CREB_N : num 0.179 0.174 0.174 0.179 0.175 ...
## $ ELK_N : num 1.87 1.76 1.77 1.29 1.32 ...
## $ ERK_N : num 3.69 3.49 3.57 2.97 2.9 ...
## $ GSK3B_N : num 1.54 1.51 1.5 1.42 1.36 ...
## $ JNK_N : num 0.265 0.256 0.26 0.26 0.251 ...
## $ MEK_N : num 0.32 0.304 0.312 0.279 0.274 ...
## $ TRKA_N : num 0.814 0.781 0.785 0.734 0.703 ...
## $ RSK_N : num 0.166 0.157 0.161 0.162 0.155 ...
## $ APP_N : num 0.454 0.431 0.423 0.411 0.399 ...
## $ Bcatenin_N : num 3.04 2.92 2.94 2.5 2.46 ...
## $ SOD1_N : num 0.37 0.342 0.344 0.345 0.329 ...
## $ MTOR_N : num 0.459 0.424 0.425 0.429 0.409 ...
## $ P38_N : num 0.335 0.325 0.325 0.33 0.313 ...
## $ pMTOR_N : num 0.825 0.762 0.757 0.747 0.692 ...
## $ DSCR1_N : num 0.577 0.545 0.544 0.547 0.537 ...
## $ AMPKA_N : num 0.448 0.421 0.405 0.387 0.361 ...
## $ NR2B_N : num 0.586 0.545 0.553 0.548 0.513 ...
## $ pNUMB_N : num 0.395 0.368 0.364 0.367 0.352 ...
## $ RAPTOR_N : num 0.34 0.322 0.313 0.328 0.312 ...
## $ TIAM1_N : num 0.483 0.455 0.447 0.443 0.419 ...
## $ pP70S6_N : num 0.294 0.276 0.257 0.399 0.393 ...
## $ NUMB_N : num 0.182 0.182 0.184 0.162 0.16 ...
## $ P70S6_N : num 0.843 0.848 0.856 0.76 0.768 ...
## $ pGSK3B_N : num 0.193 0.195 0.201 0.184 0.186 ...
## $ pPKCG_N : num 1.44 1.44 1.52 1.61 1.65 ...
## $ CDK5_N : num 0.295 0.294 0.302 0.296 0.297 ...
## $ S6_N : num 0.355 0.355 0.386 0.291 0.309 ...
## $ ADARB1_N : num 1.34 1.31 1.28 1.2 1.21 ...
## $ AcetylH3K9_N : num 0.17 0.171 0.185 0.16 0.165 ...
## $ RRP1_N : num 0.159 0.158 0.149 0.166 0.161 ...
## $ BAX_N : num 0.189 0.185 0.191 0.185 0.188 ...
## $ ARC_N : num 0.106 0.107 0.108 0.103 0.105 ...
## $ ERBB4_N : num 0.145 0.15 0.145 0.141 0.142 ...
## $ nNOS_N : num 0.177 0.178 0.176 0.164 0.168 ...
## $ Tau_N : num 0.125 0.134 0.133 0.123 0.137 ...
## $ GFAP_N : num 0.115 0.118 0.118 0.117 0.116 ...
## $ GluR3_N : num 0.228 0.238 0.245 0.235 0.256 ...
## $ GluR4_N : num 0.143 0.142 0.142 0.145 0.141 ...
## $ IL1B_N : num 0.431 0.457 0.51 0.431 0.481 ...
## $ P3525_N : num 0.248 0.258 0.255 0.251 0.252 ...
## $ pCASP9_N : num 1.6 1.67 1.66 1.48 1.53 ...
## $ PSD95_N : num 2.01 2 2.02 1.96 2.01 ...
## $ SNCA_N : num 0.108 0.11 0.108 0.12 0.12 ...
## $ Ubiquitin_N : num 1.045 1.01 0.997 0.99 0.998 ...
## $ pGSK3B_Tyr216_N: num 0.832 0.849 0.847 0.833 0.879 ...
## $ SHH_N : num 0.189 0.2 0.194 0.192 0.206 ...
## $ BAD_N : num 0.123 0.117 0.119 0.133 0.13 ...
## $ BCL2_N : num NA NA NA NA NA NA NA NA NA NA ...
## $ pS6_N : num 0.106 0.107 0.108 0.103 0.105 ...
## $ pCFOS_N : num 0.108 0.104 0.106 0.111 0.111 ...
## $ SYP_N : num 0.427 0.442 0.436 0.392 0.434 ...
## $ H3AcK18_N : num 0.115 0.112 0.112 0.13 0.118 ...
## $ EGR1_N : num 0.132 0.135 0.133 0.147 0.14 ...
## $ H3MeK4_N : num 0.128 0.131 0.127 0.147 0.148 ...
## $ CaNA_N : num 1.68 1.74 1.93 1.7 1.84 ...
## $ Genotype : Factor w/ 2 levels "Control","Ts65Dn": 1 1 1 1 1 1 1 1 1 1 ...
## $ Treatment : Factor w/ 2 levels "Memantine","Saline": 1 1 1 1 1 1 1 1 1 1 ...
## $ Behavior : Factor w/ 2 levels "C/S","S/C": 1 1 1 1 1 1 1 1 1 1 ...
## $ class : Factor w/ 8 levels "c-CS-m","c-CS-s",..: 1 1 1 1 1 1 1 1 1 1 ...
NA values:
colSums(is.na(data_mice))
## MouseID DYRK1A_N ITSN1_N BDNF_N NR1_N
## 0 3 3 3 3
## NR2A_N pAKT_N pBRAF_N pCAMKII_N pCREB_N
## 3 3 3 3 3
## pELK_N pERK_N pJNK_N PKCA_N pMEK_N
## 3 3 3 3 3
## pNR1_N pNR2A_N pNR2B_N pPKCAB_N pRSK_N
## 3 3 3 3 3
## AKT_N BRAF_N CAMKII_N CREB_N ELK_N
## 3 3 3 3 18
## ERK_N GSK3B_N JNK_N MEK_N TRKA_N
## 3 3 3 7 3
## RSK_N APP_N Bcatenin_N SOD1_N MTOR_N
## 3 3 18 3 3
## P38_N pMTOR_N DSCR1_N AMPKA_N NR2B_N
## 3 3 3 3 3
## pNUMB_N RAPTOR_N TIAM1_N pP70S6_N NUMB_N
## 3 3 3 3 0
## P70S6_N pGSK3B_N pPKCG_N CDK5_N S6_N
## 0 0 0 0 0
## ADARB1_N AcetylH3K9_N RRP1_N BAX_N ARC_N
## 0 0 0 0 0
## ERBB4_N nNOS_N Tau_N GFAP_N GluR3_N
## 0 0 0 0 0
## GluR4_N IL1B_N P3525_N pCASP9_N PSD95_N
## 0 0 0 0 0
## SNCA_N Ubiquitin_N pGSK3B_Tyr216_N SHH_N BAD_N
## 0 0 0 0 213
## BCL2_N pS6_N pCFOS_N SYP_N H3AcK18_N
## 285 0 75 0 180
## EGR1_N H3MeK4_N CaNA_N Genotype Treatment
## 210 270 0 0 0
## Behavior class
## 0 0
Number of observations by groups:
data_mice$class <- as.factor(data_mice$class)
#Число переменных в каждой группе, выделенной в зависимости от class
summary(data_mice$class) #или table(data_mice$class)
## c-CS-m c-CS-s c-SC-m c-SC-s t-CS-m t-CS-s t-SC-m t-SC-s
## 150 135 150 135 135 105 135 135
There is a small variation in the number of observations in groups. The groups are not balanced.
It can be seen that the number of variables in the groups is not the same.
summary(data_mice$BDNF_N ~ data_mice$class)
## data_mice$BDNF_N N= 1077 , 3 Missing
##
## +---------------+------+----+----------------+
## | | |N |data_mice$BDNF_N|
## +---------------+------+----+----------------+
## |data_mice$class|c-CS-m| 150|0.3392174 |
## | |c-CS-s| 135|0.3423153 |
## | |c-SC-m| 150|0.2909458 |
## | |c-SC-s| 135|0.3133925 |
## | |t-CS-m| 135|0.3127322 |
## | |t-CS-s| 105|0.3054604 |
## | |t-SC-m| 135|0.3210633 |
## | |t-SC-s| 132|0.3255864 |
## +---------------+------+----+----------------+
## |Overall | |1077|0.3190884 |
## +---------------+------+----+----------------+
There are three NA values in the BDNF_N column (in the t-CS-s subgroup).
theme_set(theme_bw())
ggplot(data_mice, aes(class, BDNF_N, color = class)) + stat_summary(fun.data = "mean_cl_normal") + ggtitle(label = "BDNF_N")
Picture 1 Graph of BDNF_N dependence on the “class” variable.
Let’s check if there are differences in the level of BDNF_N production depending on the class in the experiment using Anova.
We can use Anova if: -Normal distribution of residuals -Сonstancy of residuals variance -Observation independence (no multicollinearity) -No influential observations
Linear model:
mod_mice <- lm(BDNF_N ~ Genotype*Treatment*Behavior, data = data_mice, contrasts = list(Genotype = contr.sum, Treatment = contr.sum, Behavior = contr.sum))
summary(mod_mice)
##
## Call:
## lm(formula = BDNF_N ~ Genotype * Treatment * Behavior, data = data_mice,
## contrasts = list(Genotype = contr.sum, Treatment = contr.sum,
## Behavior = contr.sum))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.175764 -0.028777 -0.001609 0.028701 0.159388
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3188392 0.0014321 222.636 < 2e-16 ***
## Genotype1 0.0026286 0.0014321 1.835 0.06672 .
## Treatment1 -0.0028495 0.0014321 -1.990 0.04688 *
## Behavior1 0.0060922 0.0014321 4.254 2.28e-05 ***
## Genotype1:Treatment1 -0.0035367 0.0014321 -2.470 0.01368 *
## Genotype1:Behavior1 0.0132064 0.0014321 9.222 < 2e-16 ***
## Treatment1:Behavior1 0.0038930 0.0014321 2.718 0.00667 **
## Genotype1:Treatment1:Behavior1 0.0009442 0.0014321 0.659 0.50982
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04675 on 1069 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.1097, Adjusted R-squared: 0.1039
## F-statistic: 18.82 on 7 and 1069 DF, p-value: < 2.2e-16
–Observation independence (no multicollinearity): Let’s calculate the value Variance inflation factor (VIF).
vif(mod_mice)
## Genotype Treatment
## 1.007281 1.007281
## Behavior Genotype:Treatment
## 1.010105 1.010732
## Genotype:Behavior Treatment:Behavior
## 1.010105 1.010105
## Genotype:Treatment:Behavior
## 1.010105
There is no multicollinearity. All VIF values less than 2.
–No influential observations (Picture 1)
theme_set(theme_bw())
mod_diag <- fortify(mod_mice)
ggplot(mod_diag, aes(x = 1:nrow(mod_diag), y = .cooksd)) + geom_bar(stat = "identity") + ggtitle("Graph of Cook's distances")
Picture 1 Graph of Cook’s distances
None of the values exceed the conditional threshold of 2 units. No influential observations.
–Normal distribution of residues (Picture 2)
qqPlot(mod_diag$.stdresid, main = 'Graph Q-Q')
## [1] 182 918
Picture 2 Q-Q graph. This is normal distribution.
ANOVA
mice_anova <- Anova(mod_mice, type = "III")
mice_anova
## Anova Table (Type III tests)
##
## Response: BDNF_N
## Sum Sq Df F value Pr(>F)
## (Intercept) 108.323 1 49566.6260 < 2.2e-16 ***
## Genotype 0.007 1 3.3689 0.066715 .
## Treatment 0.009 1 3.9590 0.046877 *
## Behavior 0.040 1 18.0964 2.284e-05 ***
## Genotype:Treatment 0.013 1 6.0987 0.013684 *
## Genotype:Behavior 0.186 1 85.0389 < 2.2e-16 ***
## Treatment:Behavior 0.016 1 7.3894 0.006667 **
## Genotype:Treatment:Behavior 0.001 1 0.4347 0.509820
## Residuals 2.336 1069
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can notice that BDNF_N significantly depends on Treatment and Behavior. There is also a significant interaction of factors. Let us apply post hoc criteria to identify which groups there are differences in the average protein level.
data_mice$inter <- interaction(data_mice$Genotype, data_mice$Treatment, data_mice$Behavior)
mod_inter <- lm(BDNF_N ~ -1 + inter, data = data_mice)
data_tukey <- glht(mod_inter, linfct = mcp(inter= 'Tukey') )
summary(data_tukey)
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: lm(formula = BDNF_N ~ -1 + inter, data = data_mice)
##
## Linear Hypotheses:
## Estimate Std. Error
## Ts65Dn.Memantine.C/S - Control.Memantine.C/S == 0 -0.0264852 0.0055459
## Control.Saline.C/S - Control.Memantine.C/S == 0 0.0030979 0.0055459
## Ts65Dn.Saline.C/S - Control.Memantine.C/S == 0 -0.0337570 0.0059483
## Control.Memantine.S/C - Control.Memantine.C/S == 0 -0.0482717 0.0053980
## Ts65Dn.Memantine.S/C - Control.Memantine.C/S == 0 -0.0181541 0.0055459
## Control.Saline.S/C - Control.Memantine.C/S == 0 -0.0258249 0.0055459
## Ts65Dn.Saline.S/C - Control.Memantine.C/S == 0 -0.0136310 0.0055790
## Control.Saline.C/S - Ts65Dn.Memantine.C/S == 0 0.0295831 0.0056900
## Ts65Dn.Saline.C/S - Ts65Dn.Memantine.C/S == 0 -0.0072718 0.0060829
## Control.Memantine.S/C - Ts65Dn.Memantine.C/S == 0 -0.0217865 0.0055459
## Ts65Dn.Memantine.S/C - Ts65Dn.Memantine.C/S == 0 0.0083311 0.0056900
## Control.Saline.S/C - Ts65Dn.Memantine.C/S == 0 0.0006603 0.0056900
## Ts65Dn.Saline.S/C - Ts65Dn.Memantine.C/S == 0 0.0128542 0.0057223
## Ts65Dn.Saline.C/S - Control.Saline.C/S == 0 -0.0368549 0.0060829
## Control.Memantine.S/C - Control.Saline.C/S == 0 -0.0513696 0.0055459
## Ts65Dn.Memantine.S/C - Control.Saline.C/S == 0 -0.0212520 0.0056900
## Control.Saline.S/C - Control.Saline.C/S == 0 -0.0289228 0.0056900
## Ts65Dn.Saline.S/C - Control.Saline.C/S == 0 -0.0167289 0.0057223
## Control.Memantine.S/C - Ts65Dn.Saline.C/S == 0 -0.0145147 0.0059483
## Ts65Dn.Memantine.S/C - Ts65Dn.Saline.C/S == 0 0.0156029 0.0060829
## Control.Saline.S/C - Ts65Dn.Saline.C/S == 0 0.0079321 0.0060829
## Ts65Dn.Saline.S/C - Ts65Dn.Saline.C/S == 0 0.0201260 0.0061130
## Ts65Dn.Memantine.S/C - Control.Memantine.S/C == 0 0.0301176 0.0055459
## Control.Saline.S/C - Control.Memantine.S/C == 0 0.0224468 0.0055459
## Ts65Dn.Saline.S/C - Control.Memantine.S/C == 0 0.0346406 0.0055790
## Control.Saline.S/C - Ts65Dn.Memantine.S/C == 0 -0.0076708 0.0056900
## Ts65Dn.Saline.S/C - Ts65Dn.Memantine.S/C == 0 0.0045231 0.0057223
## Ts65Dn.Saline.S/C - Control.Saline.S/C == 0 0.0121939 0.0057223
## t value Pr(>|t|)
## Ts65Dn.Memantine.C/S - Control.Memantine.C/S == 0 -4.776 < 0.001 ***
## Control.Saline.C/S - Control.Memantine.C/S == 0 0.559 0.99930
## Ts65Dn.Saline.C/S - Control.Memantine.C/S == 0 -5.675 < 0.001 ***
## Control.Memantine.S/C - Control.Memantine.C/S == 0 -8.942 < 0.001 ***
## Ts65Dn.Memantine.S/C - Control.Memantine.C/S == 0 -3.273 0.02416 *
## Control.Saline.S/C - Control.Memantine.C/S == 0 -4.657 < 0.001 ***
## Ts65Dn.Saline.S/C - Control.Memantine.C/S == 0 -2.443 0.22127
## Control.Saline.C/S - Ts65Dn.Memantine.C/S == 0 5.199 < 0.001 ***
## Ts65Dn.Saline.C/S - Ts65Dn.Memantine.C/S == 0 -1.195 0.93313
## Control.Memantine.S/C - Ts65Dn.Memantine.C/S == 0 -3.928 0.00245 **
## Ts65Dn.Memantine.S/C - Ts65Dn.Memantine.C/S == 0 1.464 0.82608
## Control.Saline.S/C - Ts65Dn.Memantine.C/S == 0 0.116 1.00000
## Ts65Dn.Saline.S/C - Ts65Dn.Memantine.C/S == 0 2.246 0.32444
## Ts65Dn.Saline.C/S - Control.Saline.C/S == 0 -6.059 < 0.001 ***
## Control.Memantine.S/C - Control.Saline.C/S == 0 -9.263 < 0.001 ***
## Ts65Dn.Memantine.S/C - Control.Saline.C/S == 0 -3.735 0.00468 **
## Control.Saline.S/C - Control.Saline.C/S == 0 -5.083 < 0.001 ***
## Ts65Dn.Saline.S/C - Control.Saline.C/S == 0 -2.923 0.06852 .
## Control.Memantine.S/C - Ts65Dn.Saline.C/S == 0 -2.440 0.22320
## Ts65Dn.Memantine.S/C - Ts65Dn.Saline.C/S == 0 2.565 0.16959
## Control.Saline.S/C - Ts65Dn.Saline.C/S == 0 1.304 0.89727
## Ts65Dn.Saline.S/C - Ts65Dn.Saline.C/S == 0 3.292 0.02274 *
## Ts65Dn.Memantine.S/C - Control.Memantine.S/C == 0 5.431 < 0.001 ***
## Control.Saline.S/C - Control.Memantine.S/C == 0 4.047 0.00156 **
## Ts65Dn.Saline.S/C - Control.Memantine.S/C == 0 6.209 < 0.001 ***
## Control.Saline.S/C - Ts65Dn.Memantine.S/C == 0 -1.348 0.87979
## Ts65Dn.Saline.S/C - Ts65Dn.Memantine.S/C == 0 0.790 0.99360
## Ts65Dn.Saline.S/C - Control.Saline.S/C == 0 2.131 0.39477
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
Visualization of post hoc analysis:
theme_set(theme_bw())
MyData <- expand.grid(Genotype = levels(data_mice$Genotype), Treatment =levels(data_mice$Treatment), Behavior = levels(data_mice$Behavior))
MyData <- data.frame(
MyData,
predict(mod_mice, newdata = MyData, interval = 'confidence')
)
pos <- position_dodge(width = 0.2)
gg_linep <- ggplot(data = MyData, aes(x = Treatment, y = fit,
ymin = lwr, ymax = upr, colour = Genotype)) +
geom_point(position = pos) +
geom_errorbar(position = pos, width =0.2 )
gg_linep
Picture 3 Influence of Treatment to BDNF_N level depending on the Genotype.
theme_set(theme_bw())
MyData <- expand.grid(Genotype = levels(data_mice$Genotype), Treatment =levels(data_mice$Treatment), Behavior = levels(data_mice$Behavior))
MyData <- data.frame(
MyData,
predict(mod_mice, newdata = MyData, interval = 'confidence')
)
pos <- position_dodge(width = 0.2)
gg_linep <- ggplot(data = MyData, aes(x = Behavior, y = fit,
ymin = lwr, ymax = upr, colour = Genotype)) +
geom_point(position = pos) +
geom_errorbar(position = pos, width =0.2 )
gg_linep
Picture 4 Influence of Behavior to BDNF_N level depending on the Genotype.
theme_set(theme_bw())
MyData <- expand.grid(Genotype = levels(data_mice$Genotype), Treatment =levels(data_mice$Treatment), Behavior = levels(data_mice$Behavior))
MyData <- data.frame(
MyData,
predict(mod_mice, newdata = MyData, interval = 'confidence')
)
pos <- position_dodge(width = 0.2)
gg_linep <- ggplot(data = MyData, aes(x = Behavior, y = fit,
ymin = lwr, ymax = upr, colour = Treatment)) +
geom_point(position = pos) +
geom_errorbar(position = pos, width =0.2 )
gg_linep
Picture 5 Influence of Behavior to BDNF_N level depending on the Treatment.
Conclusions: 1.Ts65Dn.Memantine.C/S - Control.Memantine.C/S == 0 Memantine has different effects on BDNF_N of healthy mice and mice with Down syndrome. 2.Control.Memantine.S/C - Control.Memantine.C/S == 0 Context fear conditioning has an effect on BDNF_N even in healthy mice. 3.Ts65Dn.Saline.C/S - Control.Saline.C/S == 0 BDNF_N are really significantly different in healthy mice and mice with Down syndrome. 4.Ts65Dn.Saline.S/C - Ts65Dn.Saline.C/S == 0 Context fear conditioning has an effect on BDNF_N in mice with Down syndrome. 5.Ts65Dn.Memantine.S/C - Control.Memantine.S/C == 0 Memantine affects BDNF_N protein regardless of сontext fear conditioning. 6.Control.Saline.S/C - Control.Memantine.S/C == 0 The choice of drug affects on BDNF_N even in healthy mice.
Build a linear model capable of predicting the level of production of the ERBB4_N protein based on data on other proteins in the experiment
Leave only numeric variables.
data_mice_num <- select_if(data_mice, is.numeric)
fit <- lm(ERBB4_N ~ ., data = data_mice_num)
summary(fit)
##
## Call:
## lm(formula = ERBB4_N ~ ., data = data_mice_num)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.023037 -0.003370 -0.000127 0.003071 0.031955
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.566e-02 8.245e-03 3.112 0.001970 **
## DYRK1A_N -6.824e-03 7.475e-03 -0.913 0.361710
## ITSN1_N 8.127e-03 1.058e-02 0.768 0.442776
## BDNF_N 3.833e-02 1.919e-02 1.997 0.046378 *
## NR1_N -1.409e-02 6.685e-03 -2.108 0.035544 *
## NR2A_N -1.049e-04 1.361e-03 -0.077 0.938610
## pAKT_N 7.961e-02 3.044e-02 2.615 0.009203 **
## pBRAF_N -4.103e-02 3.065e-02 -1.339 0.181356
## pCAMKII_N -1.451e-05 7.954e-04 -0.018 0.985451
## pCREB_N -6.015e-02 2.910e-02 -2.067 0.039290 *
## pELK_N 4.731e-05 1.030e-03 0.046 0.963391
## pERK_N -1.987e-02 7.887e-03 -2.519 0.012109 *
## pJNK_N -6.035e-02 2.538e-02 -2.378 0.017814 *
## PKCA_N 8.377e-03 2.736e-02 0.306 0.759625
## pMEK_N 8.498e-03 2.706e-02 0.314 0.753676
## pNR1_N -2.566e-02 1.334e-02 -1.924 0.054968 .
## pNR2A_N 1.004e-02 7.239e-03 1.388 0.165927
## pNR2B_N 1.928e-02 6.659e-03 2.896 0.003959 **
## pPKCAB_N 5.727e-03 2.773e-03 2.065 0.039457 *
## pRSK_N 1.001e-02 1.427e-02 0.701 0.483573
## AKT_N -6.738e-04 8.183e-03 -0.082 0.934413
## BRAF_N 1.550e-02 1.213e-02 1.278 0.201926
## CAMKII_N -3.429e-03 2.011e-02 -0.170 0.864700
## CREB_N -1.450e-02 3.091e-02 -0.469 0.639135
## ELK_N 3.510e-03 3.719e-03 0.944 0.345811
## ERK_N 4.660e-03 2.804e-03 1.662 0.097208 .
## GSK3B_N -5.840e-03 6.824e-03 -0.856 0.392588
## JNK_N -1.188e-02 3.378e-02 -0.352 0.725317
## MEK_N 8.785e-03 2.197e-02 0.400 0.689510
## TRKA_N 5.722e-03 1.646e-02 0.348 0.728257
## RSK_N -2.299e-02 3.915e-02 -0.587 0.557214
## APP_N -5.345e-03 1.825e-02 -0.293 0.769751
## Bcatenin_N 1.107e-03 5.298e-03 0.209 0.834546
## SOD1_N -6.241e-03 3.985e-03 -1.566 0.118018
## MTOR_N 4.161e-02 1.651e-02 2.520 0.012048 *
## P38_N -1.619e-02 1.348e-02 -1.201 0.230267
## pMTOR_N -9.530e-03 7.667e-03 -1.243 0.214475
## DSCR1_N -6.849e-03 1.023e-02 -0.670 0.503374
## AMPKA_N 2.522e-02 2.294e-02 1.099 0.272263
## NR2B_N 7.391e-03 9.146e-03 0.808 0.419449
## pNUMB_N -6.467e-03 2.006e-02 -0.322 0.747326
## RAPTOR_N 4.750e-02 2.465e-02 1.927 0.054629 .
## TIAM1_N -3.284e-02 1.956e-02 -1.679 0.093892 .
## pP70S6_N 2.739e-03 5.307e-03 0.516 0.606022
## NUMB_N -3.304e-02 3.784e-02 -0.873 0.383042
## P70S6_N -4.357e-03 5.119e-03 -0.851 0.395118
## pGSK3B_N 1.291e-01 4.063e-02 3.179 0.001577 **
## pPKCG_N -7.636e-03 1.962e-03 -3.893 0.000113 ***
## CDK5_N 7.111e-04 9.807e-03 0.073 0.942225
## S6_N 1.566e-02 7.647e-03 2.049 0.041059 *
## ADARB1_N -2.609e-03 2.040e-03 -1.279 0.201557
## AcetylH3K9_N 2.113e-03 1.103e-02 0.192 0.848199
## RRP1_N -1.488e-02 1.038e-02 -1.434 0.152232
## BAX_N -1.304e-01 3.542e-02 -3.680 0.000260 ***
## ARC_N 1.687e-01 6.748e-02 2.499 0.012775 *
## nNOS_N -4.749e-03 2.854e-02 -0.166 0.867918
## Tau_N 7.082e-02 1.810e-02 3.913 0.000104 ***
## GFAP_N -4.703e-02 5.400e-02 -0.871 0.384291
## GluR3_N -7.133e-03 2.436e-02 -0.293 0.769812
## GluR4_N -7.030e-02 3.373e-02 -2.084 0.037672 *
## IL1B_N 2.845e-02 1.081e-02 2.631 0.008794 **
## P3525_N 4.994e-02 2.423e-02 2.061 0.039810 *
## pCASP9_N 8.118e-03 3.044e-03 2.667 0.007923 **
## PSD95_N 2.379e-02 3.811e-03 6.242 9.55e-10 ***
## SNCA_N -1.137e-02 3.484e-02 -0.326 0.744195
## Ubiquitin_N 2.330e-03 4.883e-03 0.477 0.633468
## pGSK3B_Tyr216_N 2.808e-02 1.006e-02 2.793 0.005439 **
## SHH_N 4.904e-02 1.998e-02 2.454 0.014471 *
## BAD_N -3.592e-02 3.508e-02 -1.024 0.306311
## BCL2_N 6.173e-03 3.112e-02 0.198 0.842840
## pS6_N NA NA NA NA
## pCFOS_N -1.232e-02 3.111e-02 -0.396 0.692240
## SYP_N 1.354e-02 1.079e-02 1.254 0.210395
## H3AcK18_N 2.401e-03 2.527e-02 0.095 0.924349
## EGR1_N -6.016e-03 2.619e-02 -0.230 0.818406
## H3MeK4_N 1.227e-02 2.154e-02 0.570 0.569097
## CaNA_N -1.749e-03 3.419e-03 -0.512 0.609159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.005885 on 476 degrees of freedom
## (528 observations deleted due to missingness)
## Multiple R-squared: 0.8641, Adjusted R-squared: 0.8427
## F-statistic: 40.37 on 75 and 476 DF, p-value: < 2.2e-16
The variable pS6_N is ideally collinear with ERBB4_N, its impact cannot be estimated. Let’s exclude it from the model.
data_mice_num <- data_mice_num[,-71]
fit <- lm(ERBB4_N ~ ., data = data_mice_num)
summary(fit)
##
## Call:
## lm(formula = ERBB4_N ~ ., data = data_mice_num)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.023037 -0.003370 -0.000127 0.003071 0.031955
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.566e-02 8.245e-03 3.112 0.001970 **
## DYRK1A_N -6.824e-03 7.475e-03 -0.913 0.361710
## ITSN1_N 8.127e-03 1.058e-02 0.768 0.442776
## BDNF_N 3.833e-02 1.919e-02 1.997 0.046378 *
## NR1_N -1.409e-02 6.685e-03 -2.108 0.035544 *
## NR2A_N -1.049e-04 1.361e-03 -0.077 0.938610
## pAKT_N 7.961e-02 3.044e-02 2.615 0.009203 **
## pBRAF_N -4.103e-02 3.065e-02 -1.339 0.181356
## pCAMKII_N -1.451e-05 7.954e-04 -0.018 0.985451
## pCREB_N -6.015e-02 2.910e-02 -2.067 0.039290 *
## pELK_N 4.731e-05 1.030e-03 0.046 0.963391
## pERK_N -1.987e-02 7.887e-03 -2.519 0.012109 *
## pJNK_N -6.035e-02 2.538e-02 -2.378 0.017814 *
## PKCA_N 8.377e-03 2.736e-02 0.306 0.759625
## pMEK_N 8.498e-03 2.706e-02 0.314 0.753676
## pNR1_N -2.566e-02 1.334e-02 -1.924 0.054968 .
## pNR2A_N 1.004e-02 7.239e-03 1.388 0.165927
## pNR2B_N 1.928e-02 6.659e-03 2.896 0.003959 **
## pPKCAB_N 5.727e-03 2.773e-03 2.065 0.039457 *
## pRSK_N 1.001e-02 1.427e-02 0.701 0.483573
## AKT_N -6.738e-04 8.183e-03 -0.082 0.934413
## BRAF_N 1.550e-02 1.213e-02 1.278 0.201926
## CAMKII_N -3.429e-03 2.011e-02 -0.170 0.864700
## CREB_N -1.450e-02 3.091e-02 -0.469 0.639135
## ELK_N 3.510e-03 3.719e-03 0.944 0.345811
## ERK_N 4.660e-03 2.804e-03 1.662 0.097208 .
## GSK3B_N -5.840e-03 6.824e-03 -0.856 0.392588
## JNK_N -1.188e-02 3.378e-02 -0.352 0.725317
## MEK_N 8.785e-03 2.197e-02 0.400 0.689510
## TRKA_N 5.722e-03 1.646e-02 0.348 0.728257
## RSK_N -2.299e-02 3.915e-02 -0.587 0.557214
## APP_N -5.345e-03 1.825e-02 -0.293 0.769751
## Bcatenin_N 1.107e-03 5.298e-03 0.209 0.834546
## SOD1_N -6.241e-03 3.985e-03 -1.566 0.118018
## MTOR_N 4.161e-02 1.651e-02 2.520 0.012048 *
## P38_N -1.619e-02 1.348e-02 -1.201 0.230267
## pMTOR_N -9.530e-03 7.667e-03 -1.243 0.214475
## DSCR1_N -6.849e-03 1.023e-02 -0.670 0.503374
## AMPKA_N 2.522e-02 2.294e-02 1.099 0.272263
## NR2B_N 7.391e-03 9.146e-03 0.808 0.419449
## pNUMB_N -6.467e-03 2.006e-02 -0.322 0.747326
## RAPTOR_N 4.750e-02 2.465e-02 1.927 0.054629 .
## TIAM1_N -3.284e-02 1.956e-02 -1.679 0.093892 .
## pP70S6_N 2.739e-03 5.307e-03 0.516 0.606022
## NUMB_N -3.304e-02 3.784e-02 -0.873 0.383042
## P70S6_N -4.357e-03 5.119e-03 -0.851 0.395118
## pGSK3B_N 1.291e-01 4.063e-02 3.179 0.001577 **
## pPKCG_N -7.636e-03 1.962e-03 -3.893 0.000113 ***
## CDK5_N 7.111e-04 9.807e-03 0.073 0.942225
## S6_N 1.566e-02 7.647e-03 2.049 0.041059 *
## ADARB1_N -2.609e-03 2.040e-03 -1.279 0.201557
## AcetylH3K9_N 2.113e-03 1.103e-02 0.192 0.848199
## RRP1_N -1.488e-02 1.038e-02 -1.434 0.152232
## BAX_N -1.304e-01 3.542e-02 -3.680 0.000260 ***
## ARC_N 1.687e-01 6.748e-02 2.499 0.012775 *
## nNOS_N -4.749e-03 2.854e-02 -0.166 0.867918
## Tau_N 7.082e-02 1.810e-02 3.913 0.000104 ***
## GFAP_N -4.703e-02 5.400e-02 -0.871 0.384291
## GluR3_N -7.133e-03 2.436e-02 -0.293 0.769812
## GluR4_N -7.030e-02 3.373e-02 -2.084 0.037672 *
## IL1B_N 2.845e-02 1.081e-02 2.631 0.008794 **
## P3525_N 4.994e-02 2.423e-02 2.061 0.039810 *
## pCASP9_N 8.118e-03 3.044e-03 2.667 0.007923 **
## PSD95_N 2.379e-02 3.811e-03 6.242 9.55e-10 ***
## SNCA_N -1.137e-02 3.484e-02 -0.326 0.744195
## Ubiquitin_N 2.330e-03 4.883e-03 0.477 0.633468
## pGSK3B_Tyr216_N 2.808e-02 1.006e-02 2.793 0.005439 **
## SHH_N 4.904e-02 1.998e-02 2.454 0.014471 *
## BAD_N -3.592e-02 3.508e-02 -1.024 0.306311
## BCL2_N 6.173e-03 3.112e-02 0.198 0.842840
## pCFOS_N -1.232e-02 3.111e-02 -0.396 0.692240
## SYP_N 1.354e-02 1.079e-02 1.254 0.210395
## H3AcK18_N 2.401e-03 2.527e-02 0.095 0.924349
## EGR1_N -6.016e-03 2.619e-02 -0.230 0.818406
## H3MeK4_N 1.227e-02 2.154e-02 0.570 0.569097
## CaNA_N -1.749e-03 3.419e-03 -0.512 0.609159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.005885 on 476 degrees of freedom
## (528 observations deleted due to missingness)
## Multiple R-squared: 0.8641, Adjusted R-squared: 0.8427
## F-statistic: 40.37 on 75 and 476 DF, p-value: < 2.2e-16
Let’s diagnose the constructed linear model. Multicollinearity:
sqrt(vif(fit))
## DYRK1A_N ITSN1_N BDNF_N NR1_N NR2A_N
## 4.872023 7.711915 4.082843 9.813852 5.012776
## pAKT_N pBRAF_N pCAMKII_N pCREB_N pELK_N
## 4.677621 3.109888 4.387688 4.173886 1.607783
## pERK_N pJNK_N PKCA_N pMEK_N pNR1_N
## 7.465857 5.148647 6.217226 4.861493 6.661844
## pNR2A_N pNR2B_N pPKCAB_N pRSK_N AKT_N
## 5.749259 7.471714 5.747702 3.989231 4.241897
## BRAF_N CAMKII_N CREB_N ELK_N ERK_N
## 6.456818 4.007758 3.161083 5.463030 7.432996
## GSK3B_N JNK_N MEK_N TRKA_N RSK_N
## 5.728862 4.358639 3.953392 8.513574 4.029508
## APP_N Bcatenin_N SOD1_N MTOR_N P38_N
## 4.643652 9.591355 4.484240 4.226200 4.466973
## pMTOR_N DSCR1_N AMPKA_N NR2B_N pNUMB_N
## 4.231519 3.816764 5.206541 3.319314 4.479676
## RAPTOR_N TIAM1_N pP70S6_N NUMB_N P70S6_N
## 4.373510 4.668370 3.128047 4.845890 3.676441
## pGSK3B_N pPKCG_N CDK5_N S6_N ADARB1_N
## 3.478913 4.695966 1.725275 4.183738 3.055476
## AcetylH3K9_N RRP1_N BAX_N ARC_N nNOS_N
## 6.540512 1.310133 2.840968 3.824210 3.096720
## Tau_N GFAP_N GluR3_N GluR4_N IL1B_N
## 4.222491 2.328700 3.273998 2.829308 3.094368
## P3525_N pCASP9_N PSD95_N SNCA_N Ubiquitin_N
## 2.810073 3.013767 3.808100 3.048122 3.245402
## pGSK3B_Tyr216_N SHH_N BAD_N BCL2_N pCFOS_N
## 3.369730 2.258544 3.662395 2.912133 2.522871
## SYP_N H3AcK18_N EGR1_N H3MeK4_N CaNA_N
## 3.070787 6.027909 3.739159 3.911425 4.442898
There is great collinearity between variables.
Resudials graph (Picture 6):
theme_set(theme_bw())
model_diag <- fortify(fit)
ggplot(data = model_diag, aes(x = .fitted, y = .stdresid)) +
geom_point() +
geom_hline(yintercept = 0) +
geom_smooth() +
geom_hline(yintercept = 2, color = "red") +
geom_hline(yintercept = -2, color = "red")
Picture 6 The resudials graph.
There is no pattern.
Graph of Cook’s distances (Picture 7):
theme_set(theme_bw())
ggplot(mod_diag, aes(x = 1:nrow(mod_diag), y = .cooksd)) + geom_bar(stat = "identity") + ggtitle("Graph of Cook's distances")
Picture 7 Graph of Cook’s distances
There is no influential observations.
Normal distribution of residues (Picture 8):
qqPlot(model_diag$.stdresid, main = 'Graph Q-Q')
## [1] 37 389
Picture 8 Q-Q graph.
There is a deviation from the normal distribution.
Total, this is a bad model in all respects. It is necessary to exclude multicollinearity between variables.
Delete all NA values.
data_mice_num_na <- na.omit(data_mice_num)
mice_pca <- rda(data_mice_num_na, center = TRUE, scale = TRUE)
head(summary(mice_pca))
##
## Call:
## rda(X = data_mice_num_na, scale = TRUE, center = TRUE)
##
## Partitioning of correlations:
## Inertia Proportion
## Total 76 1
## Unconstrained 76 1
##
## Eigenvalues, and their contribution to the correlations
##
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Eigenvalue 22.8092 13.0296 7.58645 6.89765 3.28216 3.02896 2.3636
## Proportion Explained 0.3001 0.1714 0.09982 0.09076 0.04319 0.03985 0.0311
## Cumulative Proportion 0.3001 0.4716 0.57139 0.66214 0.70533 0.74519 0.7763
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Eigenvalue 2.25184 1.80282 1.27298 1.10782 0.97653 0.84755 0.700418
## Proportion Explained 0.02963 0.02372 0.01675 0.01458 0.01285 0.01115 0.009216
## Cumulative Proportion 0.80591 0.82964 0.84639 0.86096 0.87381 0.88496 0.894179
## PC15 PC16 PC17 PC18 PC19 PC20
## Eigenvalue 0.597104 0.567936 0.520420 0.483436 0.414560 0.395137
## Proportion Explained 0.007857 0.007473 0.006848 0.006361 0.005455 0.005199
## Cumulative Proportion 0.902036 0.909509 0.916357 0.922718 0.928172 0.933371
## PC21 PC22 PC23 PC24 PC25 PC26
## Eigenvalue 0.376634 0.331066 0.318904 0.291320 0.256528 0.241724
## Proportion Explained 0.004956 0.004356 0.004196 0.003833 0.003375 0.003181
## Cumulative Proportion 0.938327 0.942683 0.946879 0.950713 0.954088 0.957268
## PC27 PC28 PC29 PC30 PC31 PC32
## Eigenvalue 0.215677 0.205354 0.1824 0.165090 0.150582 0.135747
## Proportion Explained 0.002838 0.002702 0.0024 0.002172 0.001981 0.001786
## Cumulative Proportion 0.960106 0.962808 0.9652 0.967381 0.969362 0.971149
## PC33 PC34 PC35 PC36 PC37 PC38
## Eigenvalue 0.131546 0.124803 0.121780 0.106629 0.100205 0.094696
## Proportion Explained 0.001731 0.001642 0.001602 0.001403 0.001318 0.001246
## Cumulative Proportion 0.972879 0.974522 0.976124 0.977527 0.978845 0.980091
## PC39 PC40 PC41 PC42 PC43 PC44
## Eigenvalue 0.09118 0.087324 0.080080 0.0755388 0.0725867 0.071899
## Proportion Explained 0.00120 0.001149 0.001054 0.0009939 0.0009551 0.000946
## Cumulative Proportion 0.98129 0.982440 0.983494 0.9844878 0.9854429 0.986389
## PC45 PC46 PC47 PC48 PC49
## Eigenvalue 0.0678445 0.0643947 0.0621993 0.0570225 0.0543054
## Proportion Explained 0.0008927 0.0008473 0.0008184 0.0007503 0.0007145
## Cumulative Proportion 0.9872816 0.9881289 0.9889473 0.9896976 0.9904121
## PC50 PC51 PC52 PC53 PC54
## Eigenvalue 0.0514765 0.0494509 0.0466105 0.0455651 0.0443141
## Proportion Explained 0.0006773 0.0006507 0.0006133 0.0005995 0.0005831
## Cumulative Proportion 0.9910895 0.9917401 0.9923534 0.9929530 0.9935360
## PC55 PC56 PC57 PC58 PC59
## Eigenvalue 0.0410658 0.0401352 0.0362324 0.0329131 0.0321704
## Proportion Explained 0.0005403 0.0005281 0.0004767 0.0004331 0.0004233
## Cumulative Proportion 0.9940764 0.9946045 0.9950812 0.9955143 0.9959376
## PC60 PC61 PC62 PC63 PC64 PC65
## Eigenvalue 0.0312727 0.0306336 0.02812 0.0268910 0.0238907 0.0219182
## Proportion Explained 0.0004115 0.0004031 0.00037 0.0003538 0.0003144 0.0002884
## Cumulative Proportion 0.9963491 0.9967521 0.99712 0.9974759 0.9977903 0.9980787
## PC66 PC67 PC68 PC69 PC70 PC71
## Eigenvalue 0.0198319 0.0187507 0.0181373 0.01672 0.0138757 0.0124744
## Proportion Explained 0.0002609 0.0002467 0.0002386 0.00022 0.0001826 0.0001641
## Cumulative Proportion 0.9983396 0.9985863 0.9988250 0.99904 0.9992276 0.9993917
## PC72 PC73 PC74 PC75 PC76
## Eigenvalue 0.011552 0.0110707 0.0088824 0.0086210 6.105e-03
## Proportion Explained 0.000152 0.0001457 0.0001169 0.0001134 8.034e-05
## Cumulative Proportion 0.999544 0.9996894 0.9998062 0.9999197 1.000e+00
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 14.30511
##
##
## Species scores
##
## PC1 PC2 PC3 PC4 PC5 PC6
## DYRK1A_N -0.4340 -1.1005 0.16156 -0.5787 0.40956 -0.29336
## ITSN1_N -0.7529 -1.1213 0.02545 -0.3885 0.31662 -0.43080
## BDNF_N -1.4617 -0.2776 0.13861 -0.2754 0.07777 0.02370
## NR1_N -1.4639 -0.2588 0.38581 0.2632 -0.13228 0.06977
## NR2A_N -1.3261 -0.3478 0.56854 0.2773 -0.02636 0.03598
## pAKT_N -1.0057 0.9344 -0.23091 -0.5486 -0.35948 0.03822
## ....
##
##
## Site scores (weighted sums of species scores)
##
## PC1 PC2 PC3 PC4 PC5 PC6
## 76 -0.7992 -0.5924 0.141176 0.5301 0.8206 0.3479
## 77 -0.7345 -0.5226 0.055906 0.5546 1.0258 0.4589
## 78 -0.8737 -0.5139 0.107993 0.4551 1.0148 0.3301
## 79 -0.3142 -0.5933 0.192511 0.2867 0.1504 0.5693
## 80 -0.3688 -0.4951 0.001643 0.3092 0.5272 0.6118
## 81 -0.3718 -0.4844 -0.015590 0.3715 0.4812 0.5097
## ....
Ordination plot using ggplot2 (Picture 9):
df_scores <- data.frame(data_mice_num_na,
scores(mice_pca, display = "sites", choices = c(1, 2, 3), scaling = "sites"))
data_mice_na <- na.omit(data_mice)
df_scores$class <- data_mice_na$class
p_scores <- ggplot(df_scores, aes(x = PC1, y = PC2)) +
geom_point(aes( color = class), alpha = 0.5) +
coord_equal(xlim = c(-1.2, 1.2), ylim = c(-1.2, 1.2)) + ggtitle(label = "Ordination plot") + theme_bw()
p_scores
Picture 9 Ordination plot
Now we can build scree plot of the variance (i.e. sdev^2) for every principle component (Picture 10).
mice2_pca = princomp(data_mice_num_na, center = TRUE, scale = TRUE)
## Warning: In princomp.default(data_mice_num_na, center = TRUE, scale = TRUE) :
## extra arguments 'center', 'scale' will be disregarded
screeplot(mice2_pca, type = "l", npcs = 7, xlab = "Components", ylab = "Variance", main = "Screeplot of the first 7 PCs")
Picture 10 Screeplot of the first 7 PCs.
Also we can build Cumulative variance plot (Picture 11).
plot(cumsum(mice2_pca$sdev^2 / sum(mice2_pca$sdev^2)), type= "b", xlab = "PCs", ylab = "Amount of explained variance", main = "Cumulative variance plot")
abline(v = 3, col="blue", lty=5)
abline(h = 0.995, col="blue", lty=5)
Picture 11 Cumulative variance plot for all PCs.
Contribution of each component (Picture 12):
res.pca <- prcomp(data_mice_num_na, center = T, scale = TRUE)
fviz_eig(res.pca)
Picture 12 Contribution of each component.
We notice that the first 4 explains almost 90% of variance. We can effectively reduce dimensionality from 76 to 4. We can select number of components as 4 (PC1 to PC4) and we can use these 4 components as predictor variables in new model.
data_mice_na <- na.omit(data_mice)
sc <- as.data.frame(mice2_pca$scores)
sc$class <- data_mice_na$class
all_data_mice <- cbind(data_mice_num_na, sc[,1:4])
lmodel <- lm(all_data_mice$ERBB4_N ~ Comp.1 + Comp.2 + Comp.3 + Comp.4 , data = all_data_mice)
summary(lmodel)
##
## Call:
## lm(formula = all_data_mice$ERBB4_N ~ Comp.1 + Comp.2 + Comp.3 +
## Comp.4, data = all_data_mice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.038289 -0.006572 -0.000746 0.006505 0.049965
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1558863 0.0004773 326.596 <2e-16 ***
## Comp.1 0.0042775 0.0002968 14.410 <2e-16 ***
## Comp.2 -0.0001996 0.0003982 -0.501 0.616
## Comp.3 -0.0015160 0.0006147 -2.466 0.014 *
## Comp.4 -0.0144564 0.0010127 -14.275 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01121 on 547 degrees of freedom
## Multiple R-squared: 0.433, Adjusted R-squared: 0.4289
## F-statistic: 104.4 on 4 and 547 DF, p-value: < 2.2e-16
When comparing the two constructed models, we can notice that the Adjusted R-squared coefficient of the second model is better than in the first one.
3D PCA plot (Picture 13):
fig <- plot_ly(sc, x =~Comp.1, y = ~Comp.2, z = ~Comp.3, color = ~class, size = 5)
fig <- fig %>% add_markers()
fig <- fig %>% layout(scene = list(xaxis = list(title = 'PCA1'),
yaxis = list(title = 'PCA2'),
zaxis = list(title = 'PCA3')))
fig
Picture 13 3D PCA plot (P.S. I don’t understand why it can’t be drawn here. But if call “fig” in console, it is work)